Abstract

A method based on classical principal component analysis is used to
demonstrate that, once the role of co-authors is taken into account, a group
leader should be given an h-index measure higher than usually accepted. The
method rather easily gives what is usually searched for, i.e. an estimate of
the role (or "weight") of co-authors, as the additional value to the
popularity of an author's papers. The construction of the co-authorship
popularity H-matrix is exemplified, and the role of the eigenvalues and of the
main eigenvector components is discussed. An example illustrates the points
and serves as the basis for suggesting a generally practical application of
the concept.

PACS: ... 

Keywords: ... 


1 Introduction 

The h-index value of an author results from counting his/her cited
publications [1], ranked according to their popularity (the most cited paper
gets rank r = 1, etc.). It is the rank value (h = r) such that the h papers up
to that rank each have at least h citations, while the papers beyond it
(r > h) have no more than h citations each. The h-index was invented to
quantify an author's impact, though it is rather a measure of an author's
paper productivity and/or popularity [...], which may be partially due to the
quality of some paper content [5] or to co-author fame [6].
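
The ranking rule just described can be sketched in a few lines of Python. This
is only an illustration: the function name `h_index` and the citation counts
are hypothetical and are not taken from any author discussed in this paper.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # rank r = 1 is the most cited paper
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the paper at this rank still has enough citations
        else:
            break         # every later paper has fewer citations than its rank
    return h


# Hypothetical citation counts, for illustration only
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))
```

The same function applies unchanged to any sub-list of papers, which is what
the methodology below relies on when it restricts the list to joint papers.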

It has been much discussed which publications are (or have to be) considered
when measuring h. Sometimes book citations are not counted; sometimes there
is double counting; sometimes two papers deposited on different websites are
counted as different papers, and sometimes not; sometimes papers in
proceedings are not (or sometimes are) counted as of equal value to those in
classical peer-reviewed journals, etc. [...]. Therefore, h depends on the
selection criteria and on the big data search engine: Google Scholar (Google),
Web of Science (Thomson Reuters), Scopus (Elsevier) databases [10], etc.
However, the following considerations apply to all search engine results.

Many variants have been proposed in order to remedy several so-called
defects [11]-[14] of the original h-measurement. Sometimes self-citations have
been frowned upon [...], sometimes not [...]. Another criticism, considered
very important, concerns the counting of co-authors and their role [...].
It has often been argued that the number of citations of a paper should be
weighted according to the number of co-authors of this paper, thereby reducing
the h-index of an author having many co-authors, as in the profit (p)-index
[35]. It is true that sometimes team leaders are unaware of the papers they
have published. Sometimes there is complaisance co-authorship as well [36].

The main aim of the present paper is to propose a practical and basically
sound (i.e. physics-prone) way of remeasuring the h-index of an author
publishing with co-authors (keeping, for simplicity, the same "name" (h) and
quasi-similar notations as in the h-index literature). It is argued that the
original h-index, in fact, undervalues the role of co-authors. The following
study and method therefore emphasize that the impact of a team leader, or more
generally of co-authorship, is underrepresented by the classical h-index.

The "theory arguments" seem to follow better from practical examples, in
a deductive way rather than through an inductive presentation. The
methodology is based on the principal component analysis (PCA) method, which
aims at reducing the dimensionality of a data set consisting of a large number
of interrelated variables, while retaining as much as possible of the
variation present in the data set. This is achieved by transforming the raw
data into a new set of measures, the principal components (PCs), which are
uncorrelated and ordered so that the first few retain most of the variation
present in all of the original variables. Here, the data set consists of the
h-index values of authors, considering that they can have 1, 2, 3, ...
co-authors, forming teams. For these teams, one can also calculate the
corresponding h-index, in the usual way. This leads to a square matrix whose
dimension equals the number of co-authors considered. Calculating its
eigenvalues and eigenvectors is then a classical matter. The result leads to
the true measure of co-authorship popularity from the set of ranked papers of
such co-authors. The example cases illustrating the argument are limited to a
few co-authors, but could obviously be extended. Their finite size is wholly
irrelevant; in fact, it allows a better comprehension of details. Other
extensions are briefly discussed in the conclusion section.


2 Methodology 

At the start, get the h-index ($h_{ii}$) of each author (i) from some search
engine, e.g., Google Scholar or Web of Science. The source is irrelevant, since
it will be seen that the method applies whatever the search engine. Of course,
initial and final numbers will differ, but the discussion on whether some
source is "better" than another is not part of the present development.

Next, reduce the publication lists of the authors to the joint papers of each
couple of co-authors, e.g. i and j, i.e. to $N_{ij}$ papers. After ranking
these papers, one easily obtains the equivalent of the h-index, i.e. an
$h_{ij}$, for the couple of authors. A warning: this list might include papers
which have a "large" citation number, yet not large enough to have a rank
lower than $h_{ii}$ for author i (or $h_{jj}$ for author j). Indeed, for
author i a paper might not be often cited, whence have a rank higher than
$h_{ii}$, although its number of citations might be large enough for it to
have a rank lower than the h-index of the "couple of authors", i.e. $h_{ij}$.

Needless to say, the citation lists should be taken from the same database,
for coherence. A practical point can also be mentioned: it is useful to
cross-check the lists, i.e. to repeat the procedure starting from j and
obtaining $h_{ji}$, thereby verifying that one truly obtains
$h_{ij} = h_{ji}$.
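
As a minimal sketch of this step, assume each paper is stored as a pair
(set of author labels, citation count). The labels "A", "B", "C", the data and
the function names are hypothetical; only the procedure (restrict to joint
papers, rank, read off h) follows the text.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list the condition holds exactly for the first h ranks.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def joint_h_index(papers, i, j):
    """Return (N_ij, h_ij) for the papers co-signed by both i and j."""
    joint = [cites for authors, cites in papers if i in authors and j in authors]
    return len(joint), h_index(joint)


# Hypothetical publication records: (authors, number of citations)
papers = [
    ({"A", "B"}, 12),
    ({"A", "B", "C"}, 7),
    ({"A"}, 40),
    ({"A", "B"}, 2),
    ({"B", "C"}, 5),
]

n_ab, h_ab = joint_h_index(papers, "A", "B")
n_ba, h_ba = joint_h_index(papers, "B", "A")   # cross-check starting from j
_, h_aa = joint_h_index(papers, "A", "A")      # i = j gives the author's own h
print(n_ab, h_ab, h_ba, h_aa)
```

Calling the function with i = j returns the ordinary h-index of that author,
so the same routine can supply both kinds of entries used in the next step.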

Thereafter, define the co-authorship popularity H-matrix, having $h_{ij}$ as
its off-diagonal elements and $h_{ii}$ on the diagonal. The order of the
authors i is irrelevant. However, for the discussion, it seems appropriate to
rank the authors i according to their $h_{ii}$ value. In so doing, $h_{ij}$
(or $h_{ji}$) $\le h_{jj} \le h_{ii}$. This matrix H differs from the
co-occurrence matrices introduced in [...], which only consider the frequency
of partnerships.

Finally, calculate the eigenvalues and eigenvectors of H. To emphasize the
partner weights, the lowest component of the eigenvector corresponding to
the largest eigenvalue is always set equal to 1.
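
The last step can be sketched as follows. The sketch uses plain power
iteration, which is an assumption made here for self-containedness and not
necessarily the numerical routine used to obtain the values quoted in this
paper; any standard symmetric eigensolver would do. The function name
`dominant_eigenpair` is hypothetical, and the matrix is the 4x4 H-matrix of
Sect. 3. The sketch assumes all matrix entries are positive, so that every
component of the leading eigenvector is positive.

```python
import math


def dominant_eigenpair(matrix, max_iter=1000, tol=1e-12):
    """Power iteration: largest eigenvalue and its eigenvector, rescaled so
    that the lowest component equals 1 (assumes a symmetric matrix with
    positive entries)."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(max_iter):
        product = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
        # Rayleigh quotient x^T H x for the unit vector x
        hv = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        new_value = sum(a * b for a, b in zip(vec, hv))
        converged = abs(new_value - value) < tol
        value = new_value
        if converged:
            break
    lowest = min(vec)
    weights = [x / lowest for x in vec]  # lowest component set to 1
    return value, weights


# 4x4 co-authorship popularity matrix of Sect. 3 (MAU, PCL, APE, JPE)
H4 = [
    [35, 10, 7, 2],
    [10, 11, 6, 2],
    [7, 6, 10, 2],
    [2, 2, 2, 2],
]

value, weights = dominant_eigenpair(H4)
print(round(value, 3), [round(w, 3) for w in weights])
```

Power iteration only returns the leading eigenpair; the remaining eigenvalues
require a full symmetric eigensolver.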

3 A real case 

Consider the following (i = 1, 2, 3, 4) co-authors: MAU, PCL, APE, and JPE,
respectively, having worked in statistical mechanics independently, together,
or with various co-authors. A few characteristics of their publication lists
are given in Table 1. Next, take the whole publication list of each author,
e.g. from Google Scholar, without any loss of generality for the argument.

For the present case, one obtains (six) 2x2 matrices; the same procedure is
repeated for finding the matrix elements of the (four) 3x3 matrices, and of
the unique 4x4 matrix. The number of joint papers can of course not increase
in this process. Recall that it seems convenient to order the authors
(i = 1, 2, 3, 4) according to their h-index.

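To save space, the H-matrices of this example are displayed below together
with the relevant number of joint papers N and the matrix eigenvalues; only
the (unnormalized) components of the eigenvector corresponding to the largest
eigenvalue (designated by $\lambda^{(1)}$) are given. To emphasize the partner
weights immediately, the lowest component of this eigenvector is set equal to
1; also, the index j of a component $x_j^{(1)}$ refers to the author rather
than to its usual position in the vector.

The 6 matrices emphasizing links between two authors, among the 4 considered,
are

\[
H_{\mathrm{MAU,PCL}} = \begin{pmatrix} 35 & 10 \\ 10 & 11 \end{pmatrix} ; \quad N_{1,2} = 30 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,2} = 38.620 ; \quad \lambda^{(2)}_{1,2} = 7.380 ; \quad x^{(1)}_{1} = 2.765 ; \quad x^{(1)}_{2} = 1
\]

\[
H_{\mathrm{MAU,APE}} = \begin{pmatrix} 35 & 7 \\ 7 & 10 \end{pmatrix} ; \quad N_{1,3} = 21 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,3} = 36.827 ; \quad \lambda^{(2)}_{1,3} = 8.173 ; \quad x^{(1)}_{1} = 3.841 ; \quad x^{(1)}_{3} = 1
\]

\[
H_{\mathrm{MAU,JPE}} = \begin{pmatrix} 35 & 2 \\ 2 & 2 \end{pmatrix} ; \quad N_{1,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,4} = 35.121 ; \quad \lambda^{(2)}_{1,4} = 1.879 ; \quad x^{(1)}_{1} = 16.63 ; \quad x^{(1)}_{4} = 1
\]

\[
H_{\mathrm{PCL,APE}} = \begin{pmatrix} 11 & 6 \\ 6 & 10 \end{pmatrix} ; \quad N_{2,3} = \ldots ;
\]
\[
\Rightarrow \lambda^{(1)}_{2,3} = 16.521 ; \quad \lambda^{(2)}_{2,3} = 4.479 ; \quad x^{(1)}_{2} = 1.087 ; \quad x^{(1)}_{3} = 1
\]

\[
H_{\mathrm{PCL,JPE}} = \begin{pmatrix} 11 & 2 \\ 2 & 2 \end{pmatrix} ; \quad N_{2,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{2,4} = 11.424 ; \quad \lambda^{(2)}_{2,4} = 1.576 ; \quad x^{(1)}_{2} = 4.702 ; \quad x^{(1)}_{4} = 1
\]

\[
H_{\mathrm{APE,JPE}} = \begin{pmatrix} 10 & 2 \\ 2 & 2 \end{pmatrix} ; \quad N_{3,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{3,4} = 10.472 ; \quad \lambda^{(2)}_{3,4} = 1.528 ; \quad x^{(1)}_{3} = 4.230 ; \quad x^{(1)}_{4} = 1
\]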
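
For the two-author case the eigenvalues can be checked by hand. The worked
lines below are a sketch using the standard closed form for a symmetric 2x2
matrix; the symbols a, d (diagonal) and b (off-diagonal) are introduced here
only for this illustration, and the numbers are those of the MAU,PCL matrix.

```latex
% Symmetric 2x2 popularity matrix with diagonal entries a, d and off-diagonal b
\[
\lambda^{(1,2)} = \frac{a+d}{2} \pm \sqrt{\left(\frac{a-d}{2}\right)^{2} + b^{2}}
\]
% MAU,PCL: a = 35, d = 11, b = 10
\[
\lambda^{(1,2)}_{1,2} = 23 \pm \sqrt{12^{2} + 10^{2}} = 23 \pm \sqrt{244}
\simeq 23 \pm 15.620 ,
\]
\[
\lambda^{(1)}_{1,2} \simeq 38.620 , \qquad \lambda^{(2)}_{1,2} \simeq 7.380 .
\]
```

The sum of the two eigenvalues equals the trace, 35 + 11 = 46, which gives a
quick consistency check for each of the six pairs.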

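The 4 matrices emphasizing the links between three authors, among the 4
considered, are

\[
H_{1,2,3} = \begin{pmatrix} 35 & 10 & 7 \\ 10 & 11 & 6 \\ 7 & 6 & 10 \end{pmatrix} ; \quad N_{1,2,3} = 8 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,2,3} = 41.041 ; \quad \lambda^{(2)}_{1,2,3} = 10.620 ; \quad \lambda^{(3)}_{1,2,3} = 4.338 ;
\]
\[
x^{(1)}_{1} = 3.318 ; \quad x^{(1)}_{2} = 1.304 ; \quad x^{(1)}_{3} = 1
\]

\[
H_{1,2,4} = \begin{pmatrix} 35 & 10 & 2 \\ 10 & 11 & 2 \\ 2 & 2 & 2 \end{pmatrix} ; \quad N_{1,2,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,2,4} = 38.799 ; \quad \lambda^{(2)}_{1,2,4} = 7.626 ; \quad \lambda^{(3)}_{1,2,4} = 1.575 ;
\]
\[
x^{(1)}_{1} = 13.388 ; \quad x^{(1)}_{2} = 4.888 ; \quad x^{(1)}_{4} = 1
\]

\[
H_{1,3,4} = \begin{pmatrix} 35 & 7 & 2 \\ 7 & 11 & 2 \\ 2 & 2 & 2 \end{pmatrix} ; \quad N_{1,3,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,3,4} = 37.064 ; \quad \lambda^{(2)}_{1,3,4} = 9.369 ; \quad \lambda^{(3)}_{1,3,4} = 1.567 ;
\]
\[
x^{(1)}_{1} = 13.743 ; \quad x^{(1)}_{3} = 3.771 ; \quad x^{(1)}_{4} = 1
\]

\[
H_{2,3,4} = \begin{pmatrix} 11 & 6 & 2 \\ 6 & 10 & 2 \\ 2 & 2 & 2 \end{pmatrix} ; \quad N_{2,3,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{2,3,4} = 17.051 ; \quad \lambda^{(2)}_{2,3,4} = 4.484 ; \quad \lambda^{(3)}_{2,3,4} = 1.465 ;
\]
\[
x^{(1)}_{2} = 3.903 ; \quad x^{(1)}_{3} = 3.605 ; \quad x^{(1)}_{4} = 1 .
\]
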
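The matrix emphasizing the links between the four authors is

\[
H_{1,2,3,4} = \begin{pmatrix}
35 & 10 & 7 & 2 \\
10 & 11 & 6 & 2 \\
7 & 6 & 10 & 2 \\
2 & 2 & 2 & 2
\end{pmatrix} ; \quad N_{1,2,3,4} = 2 ;
\]
\[
\Rightarrow \lambda^{(1)}_{1,2,3,4} = 41.277 ; \quad \lambda^{(2)}_{1,2,3,4} = 10.921 ; \quad
\lambda^{(3)}_{1,2,3,4} = 4.339 ; \quad \lambda^{(4)}_{1,2,3,4} = 1.463 ;
\]
\[
x^{(1)}_{1} = 11.539 ; \quad x^{(1)}_{2} = 4.576 ; \quad x^{(1)}_{3} = 3.524 ; \quad x^{(1)}_{4} = 1 .
\]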


N.B. Those 4 authors have only 2 papers in common. 


4 Case analysis and implications 

It can be immediately observed that the (here called) "average h-index" for
MAU, resulting from having co-authored papers at least with PCL or with
APE or with JPE, is $\langle h \rangle_2^{(1)}$ = (38.62 + 36.83 + 35.12)/3 =
36.86, instead of $\langle h \rangle_1$ (= $h_{11}$) = 35.

In the same line of thought, consider the (average) h-index for MAU resulting
from having co-authored papers at least with PCL and with APE, or with PCL
and with JPE, or with APE and with JPE. It is easily found that
$\langle h \rangle_3^{(1)}$ = (41.04 + 38.80 + 37.06)/3 = 38.97.

co-authors i:                        MAU    PCL    APE    JPE
h_ii                                  35     11     10      2
N. citations of most cited paper     152    127     37      7
N. citations till h                 1113    296    224     14
N. coauthors                         317     32     46      4
N. papers with "best" coauthor       155     30     21      2
N. publications (<2012)              571     34    111      2

Table 1: Productivity characteristics of the 4 co-authors considered in the text


The effective h-index value can be calculated, in a similar manner, for
another author, e.g. PCL, due to his partnership in this particular 4-member
team. It is easily obtained that $\langle h \rangle_3$ =
(17.05 + 10.62 + 7.626)/3 = 11.77, instead of $h_{22}$ = 11 for PCL.

Thus, one can emphasize that $\langle h \rangle_4^{(1)} =
\lambda^{(1)}_{1,2,3,4} = 41.277$ is not some "average", but is the truly
effective value for MAU due to publishing (and being cited) when participating
in this group of 4 co-authors.

Moreover, the largest principal component also gives some relevant
information on the relative impact of a co-author. It is sufficient to
normalize the vector components and consider the absolute weights. For
example, for MAU in the 4-member team, the largest PC is found equal to
$11.539 / \sqrt{11.539^2 + 4.576^2 + 3.524^2 + 1} \simeq 89\%$. This results
in the effective h due to team partnership being equal to
$41.277 \times 0.89 \simeq 36.80$, in contrast to the raw value 35, which
does not take the various co-authorships into account.
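
The normalization just described can be reproduced with a short sketch. The
function name `normalized_share` is hypothetical; the component values and the
eigenvalue are those quoted above for the 4-member team.

```python
import math


def normalized_share(weights, index):
    """Component `index` of the eigenvector after normalization to unit length."""
    norm = math.sqrt(sum(w * w for w in weights))
    return weights[index] / norm


# Eigenvector components (lowest set to 1) and largest eigenvalue from Sect. 3
weights = [11.539, 4.576, 3.524, 1.0]   # MAU, PCL, APE, JPE
largest_eigenvalue = 41.277

share = normalized_share(weights, 0)        # relative weight of MAU
effective_h = largest_eigenvalue * share    # effective h due to team partnership
print(round(share, 3), round(effective_h, 2))
```

The same call with index 1, 2 or 3 gives the normalized weight of each of the
other team members.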

The discussion of the eigenvector components as indicators and measures of
the respective weight gains and losses is postponed to Sect. 5.2, to better
emphasize the interest of the method.

Note that the argument on the proportionality factor can be applied to each
level of participation, considering sub-groups of co-authors.


5 Two other cases 

A reviewer of the initial version of this paper claimed that "the conclusion
is based on the use of a very small sample of 4 authors, which in addition is
not arbitrary since it includes the author himself. So contrary to the author
claim that he uses 'an arbitrarily selected example, but without loss of
generality', he is using a specific sample not at all arbitrarly (sic). At
least a few more cases should be treated to provide a more solid basis to the
idea."

Therefore, two other cases are outlined, though without going through the
complete details as above.

5.1 Extended "real case"


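Consider two other authors, (i = 5) JMK and (i = 6) DAH, having worked
with the previous 4 authors, but not with all of them. Therefore a few
off-diagonal elements necessarily vanish. N.B. Those 6 authors have no paper
in common: $N_{1,\dots,6} = 0$. Moreover, for comparison with the above, the
authors have not been ranked according to their h-index when writing the
matrix emphasizing the links between the six authors:

\[
H_{1,\dots,6} = \begin{pmatrix}
35 & 10 & 7 & 2 & 3 & 2 \\
10 & 11 & 6 & 2 & 3 & 2 \\
7 & 6 & 10 & 2 & 3 & 0 \\
2 & 2 & 2 & 2 & 1 & 0 \\
3 & 3 & 3 & 1 & 9 & 2 \\
2 & 2 & 0 & 0 & 2 & 17
\end{pmatrix}
\]
\[
\Rightarrow \lambda^{(1)}_{1,\dots,6} = 42.232 ; \quad \lambda^{(2)}_{1,\dots,6} = 17.207 ; \quad
\lambda^{(3)}_{1,\dots,6} = 12.262 ;
\]
\[
\lambda^{(4)}_{1,\dots,6} = 6.658 ; \quad \lambda^{(5)}_{1,\dots,6} = 4.188 ; \quad
\lambda^{(6)}_{1,\dots,6} = 1.452 ,
\]

with

\[
x^{(1)}_{1} = 11.167 ; \quad x^{(1)}_{2} = 4.577 ; \quad x^{(1)}_{3} = 3.513 ;
\]
\[
x^{(1)}_{4} = 1.0 ; \quad x^{(1)}_{5} = 1.859 ; \quad x^{(1)}_{6} = 1.397 .
\]
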

It is noticed that the weight of the main author goes down from 11.539 to
11.167. This is due to the influence of the sixth author, who has
$h_{66} = 17$, in contrast to the 2nd author, who has a smaller
$h_{22} = 11$. Therefore the method allows one to test the loss (or gain, as
in the previous section) of some author's influence due to some co-author.
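
A quick consistency check for such a larger matrix is that it is symmetric and
that the reported eigenvalues add up to its trace (the sum of the individual
h-indices). The sketch below applies this check to the six-author values as
read from the text above; the helper names are hypothetical.

```python
# Six-author popularity matrix and eigenvalues, as quoted in Sect. 5.1
H6 = [
    [35, 10, 7, 2, 3, 2],
    [10, 11, 6, 2, 3, 2],
    [7, 6, 10, 2, 3, 0],
    [2, 2, 2, 2, 1, 0],
    [3, 3, 3, 1, 9, 2],
    [2, 2, 0, 0, 2, 17],
]
reported_eigenvalues = [42.232, 17.207, 12.262, 6.658, 4.188, 1.452]


def is_symmetric(matrix):
    """True when h_ij equals h_ji for every pair of authors."""
    n = len(matrix)
    return all(matrix[r][c] == matrix[c][r] for r in range(n) for c in range(n))


def trace(matrix):
    """Sum of the diagonal entries, i.e. of the individual h-indices."""
    return sum(matrix[k][k] for k in range(len(matrix)))


# The eigenvalues of a symmetric matrix must sum to its trace.
gap = abs(sum(reported_eigenvalues) - trace(H6))
print(is_symmetric(H6), trace(H6), round(gap, 3))
```

A gap at the level of the rounding of the quoted eigenvalues indicates that
the matrix entries and the eigenvalues are mutually consistent.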

5.2 Shortened ’’real case” 

Indications on the weight gains and losses are best described on a "shortened
case", i.e. when the number of components of the eigenvectors, or the matrix
size, is small. In contrast to a 6-member team, consider the real case of
three (for concision) authors, MD, SG and AP, who all have a rather high
$h_{ii}$ (as of Oct. 2014), such that the popularity matrix reads

\[
H_{\mathrm{MD,SG,AP}} = \begin{pmatrix} 35 & 3 & 8 \\ 3 & 30 & 0 \\ 8 & 0 & 12 \end{pmatrix}
\]
\[
\Rightarrow \lambda^{(1)}_{\mathrm{MD,SG,AP}} = 38.479 ; \quad
\lambda^{(2)}_{\mathrm{MD,SG,AP}} = 29.070 ; \quad
\lambda^{(3)}_{\mathrm{MD,SG,AP}} = 9.451 .
\]


In this (real) case SG and AP have no paper in common. However, both
other couples, (MD, SG) and (MD, AP), have only a few papers in common,
although these are often cited, whence the relatively small $h_{12}$ and
$h_{13}$. It is of interest to write the three (normalized) eigenvectors:

\[
\begin{aligned}
x^{(1)} &= (0.303;\ -0.907;\ 0.293); \\
x^{(2)} &= (-0.044;\ -0.321;\ 0.946); \\
x^{(3)} &= (-0.952;\ -0.274;\ 0.137).
\end{aligned}
\]

Two points can be made as a brief conclusion of this subsection. On one
hand, as before, the "highest h" author, i.e. i = 1, gains from the other two
($h_{11} = 35 \to 38.5$), but the second (or middle one, in this case) does
not lose much ($h_{22} = 30 \to 29$) in having no publication, whence of
course no citation, with the third partner. On the other hand, it is well
seen that the eigenvectors indicate and measure the respective weight gains
and losses.

6 Conclusions 

In summary, the above indicates that the effect of co-authors on evaluating
the popularity of an author through the h-index method can be investigated
through a principal component analysis. Through an arbitrarily selected
example, but without loss of generality, it has been shown that the h-index
undervalues the role of the team, in particular with respect to the team
"leader". Two other cases have served to indicate that the method can be
applied to larger or smaller samples. It is found that an effective h-index
can be calculated from the eigenvalues of the co-authorship popularity
H-matrix, for any selection of team partners, up to the whole team size.

It has been remarked that the co-authorship popularity H-matrix is sensitive
to the size of the team and to the own h-index of the various members, as
should be expected, but also to the joint h-index of co-author couples. The
relative weight is therefore nicely measured when assessing the influence of
team members, e.g. on a finished project. There would be no need for a
posteriori (and previously rather non-objectively) asking the weight of a
co-author, as is often done in some dossier evaluations (see also comments on
such a consideration in the Appendix).

An interesting application would occur when the ranking of teams with a
leader is necessary in fund raising processes, but also in hiring and
promotion processes when the team partnership capacity of an author has to be
quantified. The above demonstration seems easily applied through any web
search engine, if, e.g., evaluation committees wish to consider a specific
search engine.

The approach seems much more realistic and valuable than the classical
h-index and several variants, the more so nowadays in the presence of
co-authorship inflation. The method, based on a standard physics approach, is
fundamentally different from Schreiber's $h_m$ [21] and the many variant
(fractional [30], normalized [39], ...) estimates of the h-index (see
Appendix).

In that respect, a final note about the construction of the co-authorship
popularity H-matrix seems of interest. The usual co-authorship network
considerations suggest that one could examine a not necessarily symmetric
matrix, evaluating the $h_{ij}$ elements while taking into account the order
of authors, i.e. $h_{ij} \neq h_{ji}$. This would not change the eigenvalues
much, but would modify the eigenvectors and the relative weights. However, it
is well known that the order of authors obeys different criteria, with
different justifications, in different scientific fields [50]. In the main
text, the position of the co-author in the list has not been a criterion.
Nevertheless, this consideration could be easily implemented.

In conclusion, it can be claimed that the PCA method rather easily gives
what is searched for, i.e. an estimate of the role of co-authors, as the
additional weighted value to the measure of the popularity of an author's
research papers within the h-index scheme.

Appendix

This Appendix follows a remark by a reviewer: "In addition, the author claims
that 'The method, based on a standard physics approach, is fundamentally
different from Schreiber's $h_m$ [21] and the many variant (fractional [30],
normalized [39], ...) estimates of the h-index.' without proving it. Indeed,
he should compare those different methods using the same sample."

The point is of interest, but the reviewer is misunderstanding the main
point. There is no need for a long proof to show that something is obviously
different: to any reader, the present method should appear to be (completely)
different from those weighting the number of citations through the number of
authors, in the cited references. The latter methods in fine lead to a
decrease of the apparent h-index of the "team leader", while the present
approach shows that its value should be considered to be increased instead
(because co-authors unduly take some part of the popularity value of the team
leader)!

Nevertheless, since the discussion of the fractional aspect is closely related
to the present work, values of the fractional $h_{ii}$ index, according to
[24] and [30], can be calculated as an illustration. However, the comparison
with all the normalized indices introduced in [39] or others discussed by
Schreiber [...] is not made here, in order to keep the appendix at a
reasonable length. On the other hand, it seems sufficient, for the present
point, to consider only the data of the Sect. 5.2 case.

In summary, Schreiber [24] proposes to count each paper citation only
fractionally, according to the inverse of the number of authors. In doing so,
the following results, $h_{11} = 22$, $h_{22} = 25$, $h_{33} = 5$, are
obtained for the Sect. 5.2 case. Note here that SG (i = 2) rarely has more
than one co-author and publishes many single-author papers, whence his
fractional h-index remains very high. Taking the off-diagonal elements,
fractionalized as well, into account, one reaches

\[
\lambda^{(1)}_{\mathrm{MD,SG,AP}} \simeq 27.206 ; \quad
\lambda^{(2)}_{\mathrm{MD,SG,AP}} = 21.185 ; \quad
\lambda^{(3)}_{\mathrm{MD,SG,AP}} \simeq 3.609 ,
\]

all obviously smaller than in the non-fractionalized case, but also showing a
similar evolution toward an increase of the "leader value" with respect to
the other authors.
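
The fractional counting summarized above can be sketched as follows. The
sketch follows only the one-sentence description given here (each paper's
citations divided by its number of authors before ranking); the precise
definition in [24] may differ in detail. The function name and the data are
hypothetical.

```python
def fractional_h_index(papers):
    """h-index computed on citations divided by the number of authors.

    `papers` is an iterable of (citations, n_authors) pairs.
    """
    shares = sorted((cites / n_authors for cites, n_authors in papers), reverse=True)
    # In a descending list the condition holds exactly for the first h ranks.
    return sum(1 for rank, share in enumerate(shares, start=1) if share >= rank)


# Hypothetical records: (citations, number of authors)
papers = [(30, 3), (12, 1), (9, 3), (4, 2), (2, 1)]

plain_h = fractional_h_index([(cites, 1) for cites, _ in papers])  # no weighting
frac_h = fractional_h_index(papers)
print(plain_h, frac_h)
```

In this toy list the weighting lowers the index, since the most cited papers
are shared among several authors, while single-author papers keep their full
citation count.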

In a more complicated way, Galam's tailor based allocation (TBA) [30] gives
a set of weight possibilities, although without providing an optimum one. The
weight of an author is (in short) related to his/her position in the
co-author list, but is (expectedly and supposedly) decided by the co-authors.
To keep this appendix to a finite size, a specific constraint has to be
selected for finding the fractionalized $h_{ij}$. In order to contrast with
the uniform distribution in Schreiber's approach, a practically admitted
constraint is implemented for this illustration. The choice of the weight
given to an author at position p of a given q-author paper, i.e. g(p, q), is
hereby made similar to that used in evaluating rules at FNRS (Fonds National
de la Recherche Scientifique) in Belgium. Let the value of a paper be 2 q. For
a paper with two authors, each one gets the same weight (1/q = 50%).
Otherwise, the first author gets 50% of the weight; the last author gets 25%,
the rest being equally divided between the other authors. It is obvious that
the weight of the "middle list" co-authors quickly gets small when their
number increases. Implementing such a rule leads, in the Sect. 5.2 case, to
$h_{11} = 20$, $h_{22} = 24$, $h_{33} = 6$. Taking the off-diagonal elements,
fractionalized as well, into account, one reaches

\[
\lambda^{(1)}_{\mathrm{MD,SG,AP}} = 24.339 ; \quad
\lambda^{(2)}_{\mathrm{MD,SG,AP}} = 18.300 ; \quad
\lambda^{(3)}_{\mathrm{MD,SG,AP}} = 5.362 .
\]

Again, observe the strong effect of publishing with only a few co-authors and
taking the first or last place in the list.
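
The position-dependent weight g(p, q) described above can be written as a
small function. This is a sketch of the rule as stated in this appendix, with
weights expressed as fractions of one paper; the function name `fnrs_weight`
is hypothetical, and the single-author case (q = 1, full credit) is an
assumption, since the text does not spell it out.

```python
def fnrs_weight(p, q):
    """Weight g(p, q) of the author at position p (1-based) among q authors."""
    if not 1 <= p <= q:
        raise ValueError("position p must satisfy 1 <= p <= q")
    if q == 1:
        return 1.0              # assumption: a single author keeps full credit
    if q == 2:
        return 0.5              # two authors share equally
    if p == 1:
        return 0.5              # first author
    if p == q:
        return 0.25             # last author
    return 0.25 / (q - 2)       # middle authors share the remaining 25%


# Weights for a hypothetical five-author paper
five_authors = [fnrs_weight(p, 5) for p in range(1, 6)]
print(five_authors)
```

For three authors the single middle author receives 25%, the same as the last
author; from four authors on, each middle author receives less than the last
one, which is the quick decrease mentioned above.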

It is worth quoting from [30], as a conclusion: "the TBA rescaling
disadvantages senior authors who usually sit last with several co-authors
.... (while) ... a low citation paper does contribute mainly to first and
last authors." It could be added that, according to the here called FNRS rule
or constraint implemented as in the TBA, it is not very interesting to be one
among many co-authors, the more so if the paper is not often cited and if the
position in the list is toward the middle. Fortunately, the present approach
corrects the drastic rule a little. Nevertheless, this indicates that the a
priori choice of the g(p, q) weight also much influences the final values, as
expected, but the index evolution is again the same: the "leader" gets a
higher $h_{11}$ value to the detriment of his/her colleagues.

Notice that as in [32], the present method does not need a rearrangement 
of the citation records in contrast to the fractional methods, and ... it is not 
sensitive to extreme values of the number of co-authors ... cannot decrease when 
the number of citations increases, and ... its construction does not push highly 
cited papers out of the core. 

In conclusion, it seems obvious that these approaches ([23], [30], [30], ...)
giving an a priori weight according to the position in a co-author list are
different from the present one.